160 PART 4 Comparing Groups
Data from two potentially associated categorical variables is summarized as a
cross-tabulation, which is also called a cross-tab or a two-way table. Because we are
studying the association between two variables, this is a form of bivariate analysis.
The rows of the cross-tab represent the different categories (or levels) of one vari-
able, and the columns represent the different levels of the other variable. The cells
of the table contain the number of participants with the indicated
levels for the row and column variables. If one variable can be thought of as the
“cause” or “predictor” of the other, the cause variable becomes the rows, and the
“outcome” or “effect” variable becomes the columns. If the cause and outcome
variables are both dichotomous, meaning they have only two levels (as in this
example), then the cross-tab has two rows and two columns. This structure has
four cells of counts and is referred to as a 2-by-2 (or 2 × 2) cross-tab, or a
fourfold table. Cross-tabs are displayed with an extra row at the bottom
and an extra column at the right to contain the sums of the cells in the rows and
columns of the table. These sums are called marginal totals, or just marginals.
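To make the layout concrete, here is a minimal Python sketch that tallies individual-level records into a 2 × 2 cross-tab and adds the marginal totals. The group names, outcome names, and counts are invented purely for illustration, and `cross_tab` is a hypothetical helper written for this sketch, not a function from any statistics package.

```python
from collections import Counter

# Hypothetical individual-level records: (row variable, column variable).
# The counts are invented purely for illustration.
records = (
    [("Treated", "Improved")] * 30
    + [("Treated", "Not improved")] * 20
    + [("Control", "Improved")] * 18
    + [("Control", "Not improved")] * 32
)

row_levels = ["Treated", "Control"]        # "cause" variable goes in the rows
col_levels = ["Improved", "Not improved"]  # "outcome" variable goes in the columns


def cross_tab(records, row_levels, col_levels):
    """Return the cell counts plus the row, column, and grand totals."""
    counts = Counter(records)
    cells = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]
    return cells, row_totals, col_totals, sum(row_totals)


cells, row_totals, col_totals, grand_total = cross_tab(records, row_levels, col_levels)

# Print the table with the marginal totals in the last column and last row.
print(f"{'':<10}" + "".join(f"{c:>14}" for c in col_levels + ["Total"]))
for name, row, total in zip(row_levels, cells, row_totals):
    print(f"{name:<10}" + "".join(f"{n:>14}" for n in row + [total]))
print(f"{'Total':<10}" + "".join(f"{n:>14}" for n in col_totals + [grand_total]))
```

The extra column and extra row in the printed output are the marginals: each row total, each column total, and the grand total in the bottom-right corner.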
Comparing proportions based on a fourfold table is the simplest example of test-
ing the association between two categorical variables. More generally, the vari-
ables can have any number of categories, so the cross-tab can be larger than
2 × 2, with more than two rows, more than two columns, or both. But the basic question to be answered
is always the same: Is the spread of numbers across the columns so different from one
row to the next that the numbers can’t be explained away as random fluctuations?
Another way of asking the same question is: Is being a member of a particular row
associated with being a member of a particular column?
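One way to look at this question numerically is to compare how each row spreads across the columns, and to compare the observed counts with the counts you would expect if row and column membership were unrelated. The sketch below does both for a hypothetical 2 × 2 table. The counts are invented for illustration, and the expected-count formula used here (row total times column total, divided by the grand total) is the standard one for a cross-tab under no association.

```python
# Hypothetical 2 x 2 cell counts (rows = groups, columns = outcomes).
cells = [[30, 20],
         [18, 32]]

row_totals = [sum(row) for row in cells]
col_totals = [sum(col) for col in zip(*cells)]
grand_total = sum(row_totals)

# Spread of each row across the columns, as proportions of the row total.
row_props = [[n / rt for n in row] for row, rt in zip(cells, row_totals)]

# Counts expected in each cell if row and column membership were unrelated.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

for props, exp_row in zip(row_props, expected):
    print("row proportions:", [round(p, 2) for p in props],
          "expected counts:", [round(e, 1) for e in exp_row])
```

If the row proportions look much alike, the observed counts sit close to the expected counts. The bigger the gap, the harder it is to write the pattern off as random fluctuation.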
In this chapter, we describe two tests you can use to answer this question: the
Pearson chi-square test, and the Fisher Exact test. We also explain how to esti-
mate power and sample sizes for the chi-square and Fisher Exact tests.
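As a rough preview of what these two tests compute, here is a sketch for a 2 × 2 table that uses only the Python standard library. It is an illustration under stated assumptions, not a substitute for statistical software: the counts are invented, the function names `pearson_chi_square_2x2` and `fisher_exact_2x2` are hypothetical, the chi-square version applies no continuity correction, and the Fisher version uses one common two-sided definition (sum the probabilities of all tables that are no more likely than the observed one).

```python
from math import comb, erfc, sqrt


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Assumes every marginal total is greater than zero.
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail chi-square probability
    # equals erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # holding all marginal totals fixed.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


a, b, c, d = 30, 20, 18, 32  # hypothetical counts, for illustration only
chi2, p_chi2 = pearson_chi_square_2x2(a, b, c, d)
p_fisher = fisher_exact_2x2(a, b, c, d)
print(f"chi-square = {chi2:.3f}, p = {p_chi2:.4f}")
print(f"Fisher exact p = {p_fisher:.4f}")
```

In practice you would let a statistics package do this work. SciPy, for example, provides `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact`. Note that `chi2_contingency` applies a continuity correction to 2 × 2 tables by default, so its result can differ from the uncorrected value computed here.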
As with other statistical tests, you can run all the tests in this chapter from
individual-level data in a database, where there is one record per participant. But
the tests in this chapter can also be executed using data that has already been
summarized in the form of a cross-tab:
» Most statistical software is set up to work with individual-level data. In that
case, your data file needs to have two columns for the association you want to
test: one containing the categorical variable representing the treatment group
(or whatever variable defines the rows of the cross-tab), and one containing the categorical
variable representing the outcome. If you have the correct columns, all you
have to do is tell the statistical software you are using which test or tests you
want to run, and which variables to use in the test.